Use GCS-backed IO manager for durable asset I/O #76
Merged
Materializing a downstream asset on its own (combine or geoserver) failed:

    FileNotFoundError: .../storage/<pid>/sources/<src>
    DagsterExecutionLoadInputError: loading input "src_bor" of "<pid>"

Dagster+ serverless defaults to the filesystem IO manager backed by the run's ephemeral /tmp, so a source asset's output isn't available to a combine/geoserver step unless both execute in the same run.

Configure a GCSPickleIOManager (gcs_prefix=dagster-io on the products bucket) so asset outputs persist and downstream assets load their inputs across runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
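The failure mode described in this commit message can be reproduced with a toy model. The sketch below is not Dagster code: `PickleStore`, `demo`, the directory names, and the `{"rows": 3}` payload are all hypothetical, and only the input name `src_bor` comes from the error above. It shows that a store rooted in a per-run scratch directory cannot serve a later run, while a store rooted in a shared location can.

```python
import pickle
import tempfile
from pathlib import Path


class PickleStore:
    """Toy stand-in for an IO manager: one pickle file per asset key under a root."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def handle_output(self, key: str, value: object) -> None:
        # Persist an asset's output under this store's root directory.
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / key).write_bytes(pickle.dumps(value))

    def load_input(self, key: str) -> object:
        # Raises FileNotFoundError if the upstream output was never written here.
        return pickle.loads((self.root / key).read_bytes())


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as base_dir:
        base = Path(base_dir)

        # Ephemeral case: each run has its own scratch directory.
        run_1 = PickleStore(base / "run_1_tmp")
        run_2 = PickleStore(base / "run_2_tmp")
        run_1.handle_output("src_bor", {"rows": 3})
        try:
            run_2.load_input("src_bor")
            ephemeral_failed = False
        except FileNotFoundError:
            ephemeral_failed = True

        # Durable case: both runs point at the same shared location.
        shared_root = base / "durable"
        PickleStore(shared_root).handle_output("src_bor", {"rows": 3})
        loaded = PickleStore(shared_root).load_input("src_bor")

    return ephemeral_failed, loaded


if __name__ == "__main__":
    print(demo())
```

In the real deployment the shared location is a GCS prefix rather than a local directory, but the reasoning is the same: the second run only succeeds when both runs resolve the asset key to the same durable path.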
Your pull request is automatically being deployed to Dagster Cloud.
Fixes the runtime failure (FileNotFoundError, surfaced as DagsterExecutionLoadInputError) when a downstream asset is materialized on its own.

Cause: Dagster+ serverless uses the default filesystem IO manager, backed by the run's ephemeral /tmp. A combine asset (or the geoserver leaf) can only load its source inputs if those source assets ran in the same run. Materializing the combine/geoserver asset alone, or relying on a prior run's outputs, fails because the source outputs are not there.

Fix: configure GCSPickleIOManager (prefix dagster-io on the existing products bucket) as the default io_manager. Asset outputs now persist to GCS, so downstream assets load their inputs across runs.

Scheduled runs (which materialize the full geoserver.upstream() graph) already worked; this makes ad-hoc/partial materialization work too and makes outputs durable.

Verified: definitions load with 45 assets, 6 schedules, and io_manager = GCSPickleIOManager.
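For readers who have not wired this up before, a minimal sketch of the kind of configuration described above might look like the following. It is not the diff from this PR: the two assets, the project ID, and the bucket name are hypothetical placeholders, and only the `dagster-io` prefix and the `io_manager` resource key are taken from the description. It assumes the `dagster` and `dagster-gcp` packages are installed and that GCS credentials are available at run time.

```python
from dagster import Definitions, asset
from dagster_gcp.gcs import GCSPickleIOManager, GCSResource


@asset
def src_example() -> dict:
    # Hypothetical source asset; its return value is pickled by the IO manager.
    return {"rows": 3}


@asset
def combined(src_example: dict) -> dict:
    # Hypothetical downstream asset; its input is loaded through the IO manager,
    # so it can be materialized in a later run than src_example.
    return {"total_rows": src_example["rows"]}


defs = Definitions(
    assets=[src_example, combined],
    resources={
        # The "io_manager" key sets the default IO manager for all assets.
        "io_manager": GCSPickleIOManager(
            gcs=GCSResource(project="my-gcp-project"),  # placeholder project ID
            gcs_bucket="my-products-bucket",  # placeholder bucket name
            gcs_prefix="dagster-io",  # prefix named in this PR
        ),
    },
)
```

Because the IO manager pickles outputs, this approach suits assets whose return values are picklable Python objects; assets that return something else would need a different IO manager or an explicit override.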
🤖 Generated with Claude Code